Analysis of GLDS-251 from NASA GeneLab

This R markdown file was auto-generated by the iDEP website Using iDEP 0.91, originally by Steven

Ge SX, Son EW, Yao R: iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC Bioinformatics 2018, 19(1):534. PMID:30567491

1. Read data

First we set up the working directory to where the files are saved.

 setwd('~/Documents/HTML_R/GLDS251')

R packages and iDEP core Functions. Users can also download the iDEP_core_functions.R file. Many R packages needs to be installed first. This may take hours. Each of these packages took years to develop.So be a patient thief. Sometimes dependencies needs to be installed manually. If you are using an older version of R, and having trouble with package installation, try un-install the current version of R, delete all folders and files (C:/Program Files/R/R-3.4.3), and reinstall from scratch.

 if(file.exists('iDEP_core_functions.R'))
    source('iDEP_core_functions.R') else 
    source('https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/iDEP_core_functions.R') 

We are using the downloaded gene expression file where gene IDs has been converted to Ensembl gene IDs. This is because the ID conversion database is too large to download. You can use your original file if your file uses Ensembl ID, or you do not want to use the pathway files available in iDEP (or it is not available).

 inputFile <- 'GLDS251_Expression.csv'
 sampleInfoFile <- 'GLDS251_Sampleinfo.csv'
 gldsMetadataFile <- 'GLDS251_Metadata.csv'
 geneInfoFile <- 'Arabidopsis_thaliana__athaliana_eg_gene_GeneInfo.csv' #Gene symbols, location etc. 
 geneSetFile <- 'Arabidopsis_thaliana__athaliana_eg_gene.db'  # pathway database in SQL; can be GMT format 
 STRING10_speciesFile <- 'https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/STRING10_species.csv' 

Parameters for reading data

 input_missingValue <- 'geneMedian' #Missing values imputation method
 input_dataFileFormat <- 1  #1- read counts, 2 FKPM/RPKM or DNA microarray
 input_minCounts <- 0.5 #Min counts
 input_NminSamples <- 1 #Minimum number of samples 
 input_countsLogStart <- 4  #Pseudo count for log CPM
 input_CountsTransform <- 1 #Methods for data transformation of counts. 1-EdgeR's logCPM 2-VST, 3-rlog 
readMetadata.out <- readMetadata(gldsMetadataFile)
library(knitr)   #  install if needed. for showing tables with kable
library(kableExtra)
kable( readMetadata.out ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%")
X0point09G_Rep1 X0point09G_Rep2 X0point09G_Rep3 X0point18G_Rep1 X0point18G_Rep2 X0point18G_Rep3 X0point36G_Rep1 X0point36G_Rep2 X0point36G_Rep3 X0point57G_Rep1 X0point57G_Rep2 X0point57G_Rep3 X0point57G_Rep4 X1G_Rep1 X1G_Rep2 X1G_Rep3 uG_Rep1 uG_Rep2 uG_Rep3 uG_Rep4
Sample.LongId Atha.Ler.0.sl.FLT.0.09G.Rep1.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.09G.Rep2.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.09G.Rep3.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.18G.Rep1.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.18G.Rep2.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.18G.Rep3.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.36G.Rep1.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.36G.Rep2.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.36G.Rep3.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.57G.Rep1.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.57G.Rep2.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.57G.Rep3.RNAseq.RNAseq Atha.Ler.0.sl.FLT.0.57G.Rep4.RNAseq.RNAseq Atha.Ler.0.sl.FLT.1G.Rep1.RNAseq.RNAseq Atha.Ler.0.sl.FLT.1G.Rep2.RNAseq.RNAseq Atha.Ler.0.sl.FLT.1G.Rep3.RNAseq.RNAseq Atha.Ler.0.sl.FLT.uG.Rep1.RNAseq.RNAseq Atha.Ler.0.sl.FLT.uG.Rep2.RNAseq.RNAseq Atha.Ler.0.sl.FLT.uG.Rep3.RNAseq.RNAseq Atha.Ler.0.sl.FLT.uG.Rep4.RNAseq.RNAseq
Sample.Id Atha.Ler.0.sl.FLT.0.09G.Rep1 Atha.Ler.0.sl.FLT.0.09G.Rep2 Atha.Ler.0.sl.FLT.0.09G.Rep3 Atha.Ler.0.sl.FLT.0.18G.Rep1 Atha.Ler.0.sl.FLT.0.18G.Rep2 Atha.Ler.0.sl.FLT.0.18G.Rep3 Atha.Ler.0.sl.FLT.0.36G.Rep1 Atha.Ler.0.sl.FLT.0.36G.Rep2 Atha.Ler.0.sl.FLT.0.36G.Rep3 Atha.Ler.0.sl.FLT.0.57G.Rep1 Atha.Ler.0.sl.FLT.0.57G.Rep2 Atha.Ler.0.sl.FLT.0.57G.Rep3 Atha.Ler.0.sl.FLT.0.57G.Rep4 Atha.Ler.0.sl.FLT.1G.Rep1 Atha.Ler.0.sl.FLT.1G.Rep2 Atha.Ler.0.sl.FLT.1G.Rep3 Atha.Ler.0.sl.FLT.uG.Rep1 Atha.Ler.0.sl.FLT.uG.Rep2 Atha.Ler.0.sl.FLT.uG.Rep3 Atha.Ler.0.sl.FLT.uG.Rep4
Sample.Name Atha_Ler-0_sl_FLT_0.09G_Rep1 Atha_Ler-0_sl_FLT_0.09G_Rep2 Atha_Ler-0_sl_FLT_0.09G_Rep3 Atha_Ler-0_sl_FLT_0.18G_Rep1 Atha_Ler-0_sl_FLT_0.18G_Rep2 Atha_Ler-0_sl_FLT_0.18G_Rep3 Atha_Ler-0_sl_FLT_0.36G_Rep1 Atha_Ler-0_sl_FLT_0.36G_Rep2 Atha_Ler-0_sl_FLT_0.36G_Rep3 Atha_Ler-0_sl_FLT_0.57G_Rep1 Atha_Ler-0_sl_FLT_0.57G_Rep2 Atha_Ler-0_sl_FLT_0.57G_Rep3 Atha_Ler-0_sl_FLT_0.57G_Rep4 Atha_Ler-0_sl_FLT_1G_Rep1 Atha_Ler-0_sl_FLT_1G_Rep2 Atha_Ler-0_sl_FLT_1G_Rep3 Atha_Ler-0_sl_FLT_uG_Rep1 Atha_Ler-0_sl_FLT_uG_Rep2 Atha_Ler-0_sl_FLT_uG_Rep3 Atha_Ler-0_sl_FLT_uG_Rep4
GLDS 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251 251
Accession GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251 GLDS-251
Hardware EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS EMCS
Tissue Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings Seedlings
Age 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days 6 days
Organism Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana
Ecotype Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0 Ler-0
Genotype WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT WT
Variety Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT Ler-0 WT
Radiation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation Cosmic radiation during centrifugation
Gravity 0.09 0.09 0.09 0.18 0.18 0.18 0.36 0.36 0.36 0.57 0.57 0.57 0.57 1 1 1 uG uG uG uG
Developmental Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings Radical, hypocotyl and cotyledons emerged from seed coat, 6 day old light grown seedlings
Time.series.or.Concentration.gradient Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point
Light Light Light Light Light Light Light Light Light Light Light Light Light Light Light Light Light Light Light Light Light
Assay..RNAseq. RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling RNAseq Transcription Profiling
Temperature 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24 22-24
Treatment.type RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight RNAseq analysis of the response of Arabidopsis thaliana to fractional gravity under blue-light stimulation during spaceflight
Treatment.intensity X X X X X X X X X X X X X X X X X X X X
Treament.timing X X X X X X X X X X X X X X X X X X X X
Preservation.Method. Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze Slow MELFI Freeze
 readData.out <- readData(inputFile) 
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
   kable( head(readData.out$data) ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
X0point09G_Rep1 X0point09G_Rep2 X0point09G_Rep3 X0point18G_Rep1 X0point18G_Rep2 X0point18G_Rep3 X0point36G_Rep1 X0point36G_Rep2 X0point36G_Rep3 X0point57G_Rep1 X0point57G_Rep2 X0point57G_Rep3 X0point57G_Rep4 X1G_Rep1 X1G_Rep2 X1G_Rep3 uG_Rep1 uG_Rep2 uG_Rep3 uG_Rep4
AT3G21720 12.92116 14.87339 12.01002 15.26809 11.59451 11.70500 12.39629 14.142332 13.60571 13.07303 13.13999 17.55474 14.05967 15.91151 12.84828 15.80428 12.01343 13.54135 14.31884 14.54563
AT5G38410 16.09011 15.69897 14.97083 13.28652 14.21811 13.64359 14.63662 10.733808 15.04255 15.34144 13.65071 13.26398 15.50062 13.20988 14.07133 16.24863 14.43273 15.46220 14.09612 14.16269
AT2G39730 15.92940 15.18118 14.91456 13.03639 13.83910 13.06756 13.57092 12.636520 14.66682 14.90457 13.41862 13.11109 14.79040 13.21110 13.77269 15.60616 14.25521 15.20823 13.40995 13.87479
AT5G03240 14.52111 14.45147 13.88230 13.15928 14.23806 13.84376 13.63095 14.233389 13.88200 14.27865 13.99042 16.20848 13.78525 13.97874 13.87932 14.09798 14.21643 14.08371 14.29924 14.10608
AT2G34420 12.90513 14.89524 12.94493 12.28236 13.69559 14.44864 14.38435 14.289353 14.03346 15.03109 14.96063 12.88681 14.07747 13.96566 15.26582 14.25997 13.78122 13.41527 13.89635 13.66911
AT3G01500 14.81680 13.80664 13.65943 11.26147 12.62148 11.82167 12.83392 6.948403 13.57781 13.92384 11.95762 11.78382 14.15250 11.47686 12.37785 14.87906 12.94829 14.28886 11.93102 12.57343
 readSampleInfo.out <- readSampleInfo(sampleInfoFile) 
 kable( readSampleInfo.out ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Gravity
X0point09G_Rep1 009G
X0point09G_Rep2 009G
X0point09G_Rep3 009G
X0point18G_Rep1 018G
X0point18G_Rep2 018G
X0point18G_Rep3 018G
X0point36G_Rep1 036G
X0point36G_Rep2 036G
X0point36G_Rep3 036G
X0point57G_Rep1 057G
X0point57G_Rep2 057G
X0point57G_Rep3 057G
X0point57G_Rep4 057G
X1G_Rep1 1G
X1G_Rep2 1G
X1G_Rep3 1G
uG_Rep1 uG
uG_Rep2 uG
uG_Rep3 uG
uG_Rep4 uG
 input_selectOrg ="NEW" 
 input_selectGO <- NULL     #Gene set category 
 input_noIDConversion = TRUE  
 allGeneInfo.out <- geneInfo(geneInfoFile) 
 converted.out = NULL 
 convertedData.out <- convertedData()    
 nGenesFilter()  
## [1] "16156 genes in 20 samples. 16095  genes passed filter.\n Original gene IDs used."
 convertedCounts.out <- convertedCounts()  # converted counts, just for compatibility 

2. Pre-process

# Read counts per library 
 parDefault = par() 
 par(mar=c(12,4,2,2)) 
 # barplot of total read counts
 x <- readData.out$rawCounts
 groups = as.factor( detectGroups(colnames(x ) ) )
 if(nlevels(groups)<=1 | nlevels(groups) >20 )  
  col1 = 'green'  else
  col1 = rainbow(nlevels(groups))[ groups ]             
         
 barplot( colSums(x)/1e6, 
        col=col1,las=3, main="Total read counts (millions)")  

 readCountsBias()  # detecting bias in sequencing depth 
## [1] 0.4031489
## [1] 0.4031489
## [1] "No bias detected"
 # Box plot 
 x = readData.out$data 
 boxplot(x, las = 2, col=col1,
    ylab='Transformed expression levels',
    main='Distribution of transformed data') 

 #Density plot 
 par(parDefault) 
## Warning in par(parDefault): graphical parameter "cin" cannot be set
## Warning in par(parDefault): graphical parameter "cra" cannot be set
## Warning in par(parDefault): graphical parameter "csi" cannot be set
## Warning in par(parDefault): graphical parameter "cxy" cannot be set
## Warning in par(parDefault): graphical parameter "din" cannot be set
## Warning in par(parDefault): graphical parameter "page" cannot be set
 densityPlot()       

 # Scatter plot of the first two samples 
 plot(x[,1:2],xlab=colnames(x)[1],ylab=colnames(x)[2], 
    main='Scatter plot of first two samples') 

 ####plot gene or gene family
 input_selectOrg ="BestMatch" 
 input_geneSearch <- 'HOXA' #Gene ID for searching 
 genePlot()  
## NULL
 input_useSD <- 'FALSE' #Use standard deviation instead of standard error in error bar? 
 geneBarPlotError()       
## NULL

3. Heatmap

 # hierarchical clustering tree
 x <- readData.out$data
 maxGene <- apply(x,1,max)
 # remove bottom 25% lowly expressed genes, which inflate the PPC
 x <- x[which(maxGene > quantile(maxGene)[1] ) ,] 
 plot(as.dendrogram(hclust2( dist2(t(x)))), ylab="1 - Pearson C.C.", type = "rectangle") 

 #Correlation matrix
 input_labelPCC <- TRUE #Show correlation coefficient? 
 correlationMatrix() 

 # Parameters for heatmap
 input_nGenes <- 1000   #Top genes for heatmap
 input_geneCentering <- TRUE    #centering genes ?
 input_sampleCentering <- FALSE #Center by sample?
 input_geneNormalize <- FALSE   #Normalize by gene?
 input_sampleNormalize <- FALSE #Normalize by sample?
 input_noSampleClustering <- FALSE  #Use original sample order
 input_heatmapCutoff <- 4   #Remove outliers beyond number of SDs 
 input_distFunctions <- 1   #which distant funciton to use
 input_hclustFunctions <- 1 #Linkage type
 input_heatColors1 <- 1 #Colors
 input_selectFactorsHeatmap <- 'Gravity'    #Sample coloring factors 
 png('heatmap.png', width = 10, height = 15, units = 'in', res = 300) 
 staticHeatmap() 
 dev.off()  
## png 
##   2

[heatmap] (heatmap.png)

 heatmapPlotly() # interactive heatmap using Plotly 

4. K-means clustering

 input_nGenesKNN <- 2000    #Number of genes fro k-Means
 input_nClusters <- 4   #Number of clusters 
 maxGeneClustering = 12000
 input_kmeansNormalization <- 'geneMean'    #Normalization
 input_KmeansReRun <- 0 #Random seed 

 distributionSD()  #Distribution of standard deviations 

 KmeansNclusters()  #Number of clusters 

 Kmeans.out = Kmeans()   #Running K-means 
 KmeansHeatmap()   #Heatmap for k-Means 

 #Read gene sets for enrichment analysis 
 sqlite  <- dbDriver('SQLite')
 input_selectGO3 <- 'GOBP'  #Gene set category
 input_minSetSize <- 15 #Min gene set size
 input_maxSetSize <- 2000   #Max gene set size 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO3,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 # Alternatively, users can use their own GMT files by
 #GeneSets.out <- readGMTRobust('somefile.GMT')  
 results <- KmeansGO()  #Enrichment analysis for k-Means clusters   
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Cluster adj.Pval Genes Pathways
A 3.72e-84 248 Response to abiotic stimulus
3.72e-84 168 Response to inorganic substance
1.02e-71 115 Response to metal ion
1.19e-69 100 Response to cadmium ion
7.98e-61 120 Response to osmotic stress
5.92e-60 113 Response to salt stress
4.81e-59 141 Cellular amide metabolic process
2.26e-58 130 Peptide metabolic process
2.34e-58 203 Response to organic substance
7.88e-55 184 Organonitrogen compound biosynthetic process
B 7.89e-29 70 Response to abiotic stimulus
4.11e-20 44 Defense response
4.15e-18 49 Response to oxygen-containing compound
1.20e-17 38 Response to external biotic stimulus
1.20e-17 38 Response to other organism
1.60e-17 38 Response to biotic stimulus
5.71e-17 41 Response to acid chemical
5.71e-17 37 Response to inorganic substance
7.31e-16 42 Response to external stimulus
2.67e-15 31 Defense response to other organism
C 8.63e-28 86 Catabolic process
8.56e-26 77 Organic substance catabolic process
1.08e-24 74 Cellular catabolic process
6.61e-24 92 Response to abiotic stimulus
2.48e-21 73 Cellular response to chemical stimulus
3.52e-20 67 Response to external stimulus
6.31e-20 74 Response to hormone
1.51e-19 74 Response to endogenous stimulus
1.62e-19 81 Response to organic substance
1.85e-18 75 Small molecule metabolic process
D 2.10e-121 94 Photosynthesis
3.82e-68 53 Photosynthesis, light reaction
1.63e-51 64 Generation of precursor metabolites and energy
6.34e-42 104 Response to abiotic stimulus
9.22e-38 90 Organonitrogen compound biosynthetic process
2.40e-36 26 Photosynthetic electron transport chain
6.63e-33 57 Response to light stimulus
4.97e-32 57 Response to radiation
2.53e-31 76 Oxidation-reduction process
1.51e-29 20 Photosynthesis, light harvesting
 input_seedTSNE <- 0    #Random seed for t-SNE
 input_colorGenes <- TRUE   #Color genes in t-SNE plot? 
 tSNEgenePlot()  #Plot genes using t-SNE 

5. PCA and beyond

 input_selectFactors <- 'Sample_Name'   #Factor coded by color
 input_selectFactors2 <- 'Sample_Name'  #Factor coded by shape
 input_tsneSeed2 <- 0   #Random seed for t-SNE 
 #PCA, MDS and t-SNE plots
 PCAplot()  

 MDSplot() 

 tSNEplot()  

 #Read gene sets for pathway analysis using PGSEA on principal components 
 input_selectGO6 <- 'GOBP' 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO6,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 PCApathway() # Run PGSEA analysis 
## Warning: Package 'KEGG.db' is deprecated and will be removed from Bioconductor
##   version 3.12

 cat( PCA2factor() )   #The correlation between PCs with factors 

6. DEG1

 input_CountsDEGMethod <- 3 #DESeq2= 3,limma-voom=2,limma-trend=1 
 input_limmaPval <- 0.1 #FDR cutoff
 input_limmaFC <- 2 #Fold-change cutoff
 input_selectModelComprions <- c('Gravity: 1G vs. 009G','Gravity: 1G vs. 018G','Gravity: 1G vs. 036G','Gravity: 1G vs. 057G','Gravity: 1G vs. uG')  #Selected comparisons
 input_selectFactorsModel <- 'Gravity'  #Selected comparisons
 input_selectInteractions <- NULL   #Selected comparisons
 input_selectBlockFactorsModel <- NULL  #Selected comparisons
 factorReferenceLevels.out <- c('Gravity:1G') 

 limma.out <- limma()
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
 DEG.data.out <- DEG.data()
 limma.out$comparisons 
## [1] "1G-009G" "1G-018G" "1G-036G" "1G-057G" "1G-uG"
 input_selectComparisonsVenn = limma.out$comparisons[1:3] # use first three comparisons
 input_UpDownRegulated <- FALSE #Split up and down regulated genes 
 vennPlot() # Venn diagram 

  sigGeneStats() # number of DEGs as figure 

  sigGeneStatsTable() # number of DEGs as table 
##          Comparisons  Up Down
## X1G-009G    X1G-009G 125   12
## X1G-018G    X1G-018G   5    2
## X1G-036G    X1G-036G   0    0
## X1G-057G    X1G-057G   1    0
## X1G-uG        X1G-uG   4    2

7. DEG2

 input_selectContrast <- '1G-009G'  #Selected comparisons 
 selectedHeatmap.data.out <- selectedHeatmap.data()
 selectedHeatmap()   # heatmap for DEGs in selected comparison
## Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of a vector type, was 'NULL'
 # Save gene lists and data into files
 write.csv( selectedHeatmap.data()$genes, 'heatmap.data.csv') 
 write.csv(DEG.data(),'DEG.data.csv' )
 write(AllGeneListsGMT() ,'AllGeneListsGMT.gmt')
 input_selectGO2 <- 'GOBP'  #Gene set category 
 geneListData.out <- geneListData()  
 volcanoPlot()  

  scatterPlot()  

  MAplot()  

  geneListGOTable.out <- geneListGOTable()  
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO2,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_removeRedudantSets <- TRUE   #Remove highly redundant gene sets? 
 results <- geneListGO()  #Enrichment analysis
## Error in if (dim(results1)[2] == 1) return(results1) else {: argument is of length zero
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Cluster adj.Pval Genes Pathways
A 3.72e-84 248 Response to abiotic stimulus
3.72e-84 168 Response to inorganic substance
1.02e-71 115 Response to metal ion
1.19e-69 100 Response to cadmium ion
7.98e-61 120 Response to osmotic stress
5.92e-60 113 Response to salt stress
4.81e-59 141 Cellular amide metabolic process
2.26e-58 130 Peptide metabolic process
2.34e-58 203 Response to organic substance
7.88e-55 184 Organonitrogen compound biosynthetic process
B 7.89e-29 70 Response to abiotic stimulus
4.11e-20 44 Defense response
4.15e-18 49 Response to oxygen-containing compound
1.20e-17 38 Response to external biotic stimulus
1.20e-17 38 Response to other organism
1.60e-17 38 Response to biotic stimulus
5.71e-17 41 Response to acid chemical
5.71e-17 37 Response to inorganic substance
7.31e-16 42 Response to external stimulus
2.67e-15 31 Defense response to other organism
C 8.63e-28 86 Catabolic process
8.56e-26 77 Organic substance catabolic process
1.08e-24 74 Cellular catabolic process
6.61e-24 92 Response to abiotic stimulus
2.48e-21 73 Cellular response to chemical stimulus
3.52e-20 67 Response to external stimulus
6.31e-20 74 Response to hormone
1.51e-19 74 Response to endogenous stimulus
1.62e-19 81 Response to organic substance
1.85e-18 75 Small molecule metabolic process
D 2.10e-121 94 Photosynthesis
3.82e-68 53 Photosynthesis, light reaction
1.63e-51 64 Generation of precursor metabolites and energy
6.34e-42 104 Response to abiotic stimulus
9.22e-38 90 Organonitrogen compound biosynthetic process
2.40e-36 26 Photosynthetic electron transport chain
6.63e-33 57 Response to light stimulus
4.97e-32 57 Response to radiation
2.53e-31 76 Oxidation-reduction process
1.51e-29 20 Photosynthesis, light harvesting

STRING-db API access. We need to find the taxonomy id of your species, this used by STRING. First we try to guess the ID based on iDEP’s database. Users can also skip this step and assign NCBI taxonomy id directly by findTaxonomyID.out = 10090 # mouse 10090, human 9606 etc.

 STRING10_species = read.csv(STRING10_speciesFile)  
 ix = grep('Arabidopsis thaliana', STRING10_species$official_name ) 
 findTaxonomyID.out <- STRING10_species[ix,1] # find taxonomyID
 findTaxonomyID.out  
## [1] 3702

Enrichment analysis using STRING

  STRINGdb_geneList.out <- STRINGdb_geneList() #convert gene lists
 input_STRINGdbGO <- 'Process'  #'Process', 'Component', 'Function', 'KEGG', 'Pfam', 'InterPro' 
 results <- stringDB_GO_enrichmentData()  # enrichment using STRING 
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
x
NULL

PPI network retrieval and analysis

 input_nGenesPPI <- 100 #Number of top genes for PPI retrieval and analysis 
 stringDB_network1(1) #Show PPI network 

Generating interactive PPI

 write(stringDB_network_link(), 'PPI_results.html') # write results to html file 
## Warning: 'string_db$get_link' is deprecated.
## Use 'Contact developers to request functionality' instead.
## See help("Deprecated")

## Warning: 'string_db$get_link' is deprecated.
## Use 'Contact developers to request functionality' instead.
## See help("Deprecated")

## Warning: 'string_db$get_link' is deprecated.
## Use 'Contact developers to request functionality' instead.
## See help("Deprecated")
 browseURL('PPI_results.html') # open in browser 

8. Pathway analysis

 input_selectContrast1 = limma.out$comparisons[1] 
 #input_selectContrast1 = limma.out$comparisons[3] # manually set 
 input_selectGO = 'GOBP'  # gene set category 
 #input_selectGO='custom' # if custom gmt file
 input_minSetSize <- 15 #Min size for gene set
 input_maxSetSize <- 2000   #Max size for gene set 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_pathwayPvalCutoff <- 0.2 #FDR cutoff
 input_nPathwayShow <- 30   #Top pathways to show
 input_absoluteFold <- FALSE    #Use absolute values of fold-change?
 input_GenePvalCutoff <- 1  #FDR to remove genes 

 input_pathwayMethod = 1  # 1  GAGE
 gagePathwayData.out <- gagePathwayData()  # pathway analysis using GAGE  
   
 results <- gagePathwayData.out  #Enrichment analysis for k-Means clusters  
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Direction GAGE analysis: 1G vs 009G statistic Genes adj.Pval
Down Photosynthesis -7.7332 222 6.7e-11
Photosynthesis, light reaction -7.5326 119 7.2e-10
Photosynthetic electron transport chain -5.1818 46 5.0e-04
Plastid organization -4.7996 253 5.0e-04
Photosystem II assembly -4.4945 25 1.3e-02
Chlorophyll metabolic process -4.042 80 1.3e-02
Chlorophyll biosynthetic process -3.8939 57 2.3e-02
Porphyrin-containing compound metabolic process -3.7403 91 2.4e-02
Tetrapyrrole metabolic process -3.7347 92 2.4e-02
NcRNA metabolic process -3.6851 423 2.4e-02
Chloroplast organization -3.5641 196 3.5e-02
Up Response to chitin 5.558 103 1.2e-04
Response to organonitrogen compound 4.5979 211 2.0e-03
Cellular response to hypoxia 4.4878 171 2.0e-03
Response to hypoxia 4.4674 193 2.0e-03
Cellular response to decreased oxygen levels 4.4164 172 2.0e-03
Cellular response to oxygen levels 4.4164 172 2.0e-03
Response to decreased oxygen levels 4.3794 196 2.0e-03
Response to oxygen levels 4.3773 197 2.0e-03
Response to water deprivation 4.122 294 4.5e-03
Response to water 4.0702 298 4.8e-03
Response to drug 4.0503 471 4.8e-03
Secondary metabolite biosynthetic process 3.7363 111 1.9e-02
External encapsulating structure organization 3.6243 427 2.2e-02
Response to nitrogen compound 3.585 267 2.4e-02
Secondary metabolic process 3.5712 254 2.4e-02
Response to temperature stimulus 3.523 486 2.6e-02
Response to wounding 3.4929 148 3.0e-02
Polysaccharide metabolic process 3.4752 366 3.0e-02
Ethylene-activated signaling pathway 3.4507 137 3.3e-02
 pathwayListData.out = pathwayListData() 
 enrichmentPlot(pathwayListData.out, 25  ) 

  enrichmentNetwork(pathwayListData.out )  

  enrichmentNetworkPlotly(pathwayListData.out) 

## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
 input_pathwayMethod = 3  # 1  fgsea 
 fgseaPathwayData.out <- fgseaPathwayData() #Pathway analysis using fgsea 
## Warning in fgsea(pathways = gmt, stats = fold, minSize = input_minSetSize, :
## You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To
## run fgseaMultilevel, you need to remove the nperm argument in the fgsea function
## call.
 results <- fgseaPathwayData.out  #Enrichment analysis for k-Means clusters 
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Direction GSEA analysis: 1G vs 009G NES Genes adj.Pval
Down Photosynthesis, light reaction -2.658 119 7.5e-03
Photosynthesis -2.6137 222 8.3e-03
Photosynthetic electron transport chain -2.4056 46 7.4e-03
Photosystem II assembly -2.1641 25 7.4e-03
Plastid membrane organization -2.072 47 7.4e-03
Chlorophyll metabolic process -2.0552 80 7.5e-03
Protein-chromophore linkage -2.0511 39 7.4e-03
Reductive pentose-phosphate cycle -2.0272 16 9.7e-03
Photosynthesis, dark reaction -2.0272 16 9.7e-03
Chlorophyll biosynthetic process -2.0271 57 7.4e-03
Tetrapyrrole metabolic process -1.9699 92 7.5e-03
Porphyrin-containing compound metabolic process -1.968 91 7.5e-03
Plastid organization -1.962 253 8.7e-03
Up Response to chitin 2.4804 103 5.9e-03
Cellular response to hypoxia 2.2021 171 5.9e-03
Response to organonitrogen compound 2.1986 211 5.9e-03
Cellular response to decreased oxygen levels 2.1897 172 5.9e-03
Cellular response to oxygen levels 2.1897 172 5.9e-03
Response to hypoxia 2.1745 193 5.9e-03
Response to oxygen levels 2.1609 197 5.9e-03
Response to decreased oxygen levels 2.1594 196 5.9e-03
Response to wounding 2.0894 148 5.9e-03
Secondary metabolite biosynthetic process 2.0468 111 5.9e-03
Ethylene-activated signaling pathway 2.0232 137 5.9e-03
Toxin metabolic process 1.9966 54 8.1e-03
Response to water deprivation 1.9891 294 5.9e-03
Response to water 1.9697 298 5.9e-03
S-glycoside metabolic process 1.9636 89 5.9e-03
Glycosinolate metabolic process 1.9636 89 5.9e-03
Glucosinolate metabolic process 1.9636 89 5.9e-03
  pathwayListData.out = pathwayListData() 
 enrichmentPlot(pathwayListData.out, 25  ) 

  enrichmentNetwork(pathwayListData.out )  

  enrichmentNetworkPlotly(pathwayListData.out) 

   PGSEAplot() # pathway analysis using PGSEA 
## 
## Computing P values using ANOVA

9. Chromosome

 input_selectContrast2 = limma.out$comparisons[1] 
 #input_selectContrast2 = limma.out$comparisons[3] # manually set
 input_limmaPvalViz <- 0.1  #FDR to filter genes
 input_limmaFCViz <- 2  #FDR to filter genes 
 genomePlotly() # shows fold-changes on the genome 
## Warning in eval(quote(list(...)), env): NAs introduced by coercion
## Warning in genomePlotly(): NAs introduced by coercion

10. Biclustering

 input_nGenesBiclust <- 1000    #Top genes for biclustering
 input_biclustMethod <- 'BCCC()'    #Method: 'BCCC', 'QUBIC', 'runibic' ... 
 biclustering.out = biclustering()  # run analysis

 input_selectBicluster <- 1 #select a cluster 
 biclustHeatmap()   # heatmap for selected cluster 

 input_selectGO4 <- 'GOBP'  #Gene set category 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO4,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 results <- geneListBclustGO()  #Enrichment analysis for k-Means clusters   
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
adj.Pval Genes Pathways
3.9e-126 299 Response to abiotic stimulus
7.1e-92 176 Response to inorganic substance
5.6e-81 235 Response to organic substance
1.9e-74 93 Photosynthesis
1.0e-73 194 Oxidation-reduction process
1.9e-71 202 Response to hormone
6.1e-71 203 Response to endogenous stimulus
1.1e-67 206 Small molecule metabolic process
1.4e-66 110 Response to metal ion
1.2e-61 93 Response to cadmium ion

11. Co-expression network

 input_mySoftPower <- 5 #SoftPower to cutoff
 input_nGenesNetwork <- 1000    #Number of top genes
 input_minModuleSize <- 20  #Module size minimum 
 wgcna.out = wgcna()   # run WGCNA  
## Warning: executing %dopar% sequentially: no parallel backend registered
##    Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
## 1      1   0.4600  2.020          0.908  370.00    375.00  528.0
## 2      2   0.0848  0.317          0.774  194.00    192.00  339.0
## 3      3   0.0620 -0.180          0.753  119.00    113.00  243.0
## 4      4   0.3830 -0.436          0.849   80.10     71.60  188.0
## 5      5   0.5810 -0.623          0.826   57.70     48.70  156.0
## 6      6   0.7220 -0.785          0.808   43.60     34.80  133.0
## 7      7   0.7550 -0.901          0.769   34.10     25.80  118.0
## 8      8   0.7820 -1.030          0.752   27.50     19.90  106.0
## 9      9   0.8140 -1.080          0.766   22.70     15.50   95.8
## 10    10   0.7900 -1.130          0.731   19.10     12.60   87.6
## 11    12   0.8000 -1.160          0.756   14.10      8.25   74.7
## 12    14   0.7940 -1.150          0.765   10.90      5.55   64.8
## 13    16   0.7880 -1.150          0.784    8.73      3.92   57.0
## 14    18   0.8010 -1.140          0.824    7.16      2.90   50.5
## 15    20   0.8520 -1.150          0.879    5.99      2.18   45.6
## TOM calculation: adjacency..
## ..will not use multithreading.
##  Fraction of slow calculations: 0.000000
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
 softPower()  # soft power curve 

  modulePlot()  # plot modules  

  listWGCNA.Modules.out = listWGCNA.Modules() #modules
 input_selectGO5 <- 'GOBP'  #Gene set category 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO5,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_selectWGCNA.Module <- 'Entire network'   #Select a module
 input_topGenesNetwork <- 10    #SoftPower to cutoff
 input_edgeThreshold <- 0.4 #Number of top genes 
 moduleNetwork()    # show network of top genes in selected module
##  softConnectivity: FYI: connecitivty of genes with less than 7 valid samples will be returned as NA.
##  ..calculating connectivities..

 input_removeRedudantSets <- TRUE   #Remove redundant gene sets 
 results <- networkModuleGO()  #Enrichment analysis of selected module
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
adj.Pval Genes Pathways
3.9e-126 299 Response to abiotic stimulus
7.1e-92 176 Response to inorganic substance
5.6e-81 235 Response to organic substance
1.9e-74 93 Photosynthesis
1.0e-73 194 Oxidation-reduction process
1.9e-71 202 Response to hormone
6.1e-71 203 Response to endogenous stimulus
1.1e-67 206 Small molecule metabolic process
1.4e-66 110 Response to metal ion
1.2e-61 93 Response to cadmium ion